speech recognition system. A series of flight tests was conducted using
12  participants  and  8  different  flight  maneuvers.  Data  were  collected  with 
the  participant  speaking  50  phonetically  balanced  words  into  the  speech 
recognizer  while  seated  in  the  copilot's  seat  of  a  UH-1H  helicopter  during 
each of the 8 flight maneuvers. The results indicate that speech recognition
system accuracy is not affected by helicopter vibration.

THE EFFECT OF HELICOPTER VIBRATION ON THE
ACCURACY OF A VOICE RECOGNITION SYSTEM


INTRODUCTION 

Computer  recognition  of  speech  is  a  potentially  advantageous 
alternative  to  manual  data  input  in  helicopter  cockpits.  The  use  of  voice 
allows  pilots  to  keep  their  hands  on  the  controls  and  their  eyes  focused 
outside  the  cockpit.  This  reduces  manual  and  visual  workload  for  input 
tasks  performed  concurrently  with  flying  tasks  (Dennison  &  Moore,  1985; 
Malkin  &  Christ,  1985a). 

One  of  the  issues  that  must  be  resolved  before  voice  recognition  can 
be  considered  a  viable  technology  for  helicopters  is  the  effect  of 
vibration-induced changes in the voice on the accuracy of a voice recognition
system. Anecdotal reports suggest that the voice of helicopter pilots
can  sound  "shaky"  during  some  maneuvers.  The  French  Service  Technique  des 
Telecommunications  et  Equipements  Aeronautiques  (STTE)  tested  a  voice 
recognition  system  aboard  French  Navy  and  Air  Force  Puma  helicopters  and 
reported high (95%+) recognition rates ("France Completes," 1986). No
details pertaining to the conditions of the study, such as vibration levels, were
reported.  Dennison  (unpublished,  1985)  used  an  8000-pound  capacity 
"vibration  table"  with  a  Bell  Cobra  seat  mounted  to  it  to  expose  subjects 
to  various  levels  of  vertical  vibrations.  Subjects  enrolled  a  50-word 
vocabulary  while  not  vibrating;  then  spoke  the  words  at  baseline  (no 
vibration) and all six combinations of vibration frequencies of 6 Hz and 24
Hz and accelerations of .05g, .075g, and .1g. The frequencies represent
the one-per-rev and the four-per-rev harmonics of a four-bladed rotor
system.  No  difference  in  recognition  accuracy  was  detected  between 
baseline  and  any  of  the  vibration  conditions.  Cruise,  Denson,  and 
Rajasekaran  (in  press)  obtained  similar  results  using  a  vibration  table 
with  a  different  voice  recognition  system. 
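
The vibration-table conditions described above can be enumerated in a few lines. The following Python sketch is illustrative only: the function and variable names are assumptions and are not part of the cited procedure. It lists the six frequency and acceleration combinations and shows how an n-per-rev frequency follows from the rotor's rotation rate.

```python
from itertools import product

# Values taken from the text: two frequencies (Hz) and three accelerations (g).
FREQUENCIES_HZ = (6.0, 24.0)
ACCELERATIONS_G = (0.05, 0.075, 0.1)


def n_per_rev_hz(rotation_hz: float, n: int) -> float:
    """Frequency of the n-per-rev harmonic for a rotor turning at rotation_hz."""
    return n * rotation_hz


# All six vibration conditions; the no-vibration baseline is a separate condition.
conditions = list(product(FREQUENCIES_HZ, ACCELERATIONS_G))

for freq, accel in conditions:
    print(f"{freq:g} Hz at {accel:g} g")

# A one-per-rev of 6 Hz implies a four-per-rev of 24 Hz.
print(n_per_rev_hz(6.0, 4))
```

A one-per-rev frequency of 6 Hz corresponds to a rotor turning at 360 revolutions per minute, which is the rate implied by the frequencies quoted above.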

Studies  performed  with  a  vibration  table  allow  precise  control  of 
vibration  level,  but  they  lack  realism.  The  current  study  makes  use  of  an 
Army  UH-1H  helicopter  to  evaluate  the  effects  of  realistic,  in-flight 
vibration  levels.  Other  differences  between  the  studies  are  that  (1)  the 
UH-1H  has  a  two-bladed  rotor  system  while  the  vibration  levels  selected  in 
the  "shaker  table"  studies  simulated  a  four-bladed  rotor  system  and  (2) 
subharmonics  that  are  not  produced  by  the  vibration  table  are  present  in 
flight.


OBJECTIVE 

The  objective  of  this  study  was  to  conduct  a  flight  test  to  examine  the 
effect of vibration-induced changes in the voice on the accuracy of a voice
recognition  system. 


METHOD 


Participants 

Twelve  males  volunteered  to  serve  as  subjects.  Six  of  the 
participants  were  civilian  employees  of  the  Human  Engineering  Laboratory, 
Aberdeen  Proving  Ground,  Maryland.  The  remaining  six  were  U.S.  Army 
aviators.


Apparatus 

All  data  were  collected  with  participants  seated  in  the  copilot's  seat 
of  a  U.S.  Army  UH-1H  helicopter.  A  B&K  ride  pad  accelerometer  and 
amplifiers  for  each  axis  were  employed  to  record  vibration  data. 
Participants  used  a  noise-canceling  microphone  attached  to  a  binaural 
headset.  A  Votan  VPC-2000  was  integrated  with  an  IBM  PC  XT.  This  was 
repackaged into a flightworthy unit by SCI Systems. The Votan VPC-2000 is
a speaker-dependent voice recognition system which requires a sample of how
each user pronounces each utterance in a predetermined vocabulary prior to
use.  This  is  referred  to  as  training  or  enrolling  the  system.  Each  sample 
is  stored  in  memory  as  a  reference  for  later  comparisons.  When  in  use,  the 
system  recognizes  words  by  comparing  current  utterances  with  the  samples 
stored  in  memory  and  selecting  the  closest  match.  The  acceptance  setting 
for  the  recognizer  was  left  at  the  default  value  of  50  on  a  scale  of  1-255; 
the  gain  was  set  at  2  on  a  scale  of  1-7. 
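
The enroll-then-match behavior described above can be sketched as nearest-template matching with a rejection threshold. This is a minimal illustration under stated assumptions, not the Votan algorithm: the report does not describe the recognizer's feature representation, its distance measure, or how the 1-255 acceptance setting maps to a threshold. The three-element feature vectors, the Euclidean distance, and the `max_distance` parameter are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(utterance, templates, max_distance):
    """Return the word whose stored sample is closest, or None if none is close enough."""
    best_word, best_dist = None, math.inf
    for word, samples in templates.items():
        for sample in samples:
            d = distance(utterance, sample)
            if d < best_dist:
                best_word, best_dist = word, d
    # Hypothetical acceptance rule: reject matches that are too far away.
    return best_word if best_dist <= max_distance else None


# Hypothetical enrollment data: several stored samples per vocabulary word.
templates = {
    "MAP": [(1.0, 0.2, 0.1), (0.9, 0.3, 0.1)],
    "RUG": [(0.1, 0.9, 0.4), (0.2, 1.0, 0.5)],
}

print(recognize((0.95, 0.25, 0.1), templates, max_distance=0.5))
print(recognize((5.0, 5.0, 5.0), templates, max_distance=0.5))
```

Storing several samples per word mirrors the repeated enrollment passes described under Training; an utterance that matches no stored sample closely enough is rejected rather than forced onto the nearest word.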

Participants'  utterances,  vibration  data,  cockpit  noise  levels,  and 
aircraft  communications  were  recorded  using  a  Honeywell  6300E  28-channel 
data  recorder.  Cockpit  noise  levels  were  measured  with  a  B&K  noise  level 
meter.  The  vocabulary  consisted  of  50  words  (see  Appendix)  from  the 
Phonetically Balanced Word List Speech Intelligibility Test, U.S. Army TOP
1-2-610.  The  occurrence  of  phonemes  in  this  list  is  proportionate  to  their 
occurrence  in  English.  A  tape  loop  of  recorded  UH-1H  noise  was  played  on  a 
Nagra tape recorder through loudspeakers placed in an acoustic reverberation
chamber. The baseline and all in-flight recognition data were
recorded  on  5.25-inch,  double-sided,  double-density  floppy  disks. 

Procedure 

Noise  can  have  a  detrimental  effect  on  speech  recognition  system 
performance  and  it  is  a  potential  confounding  effect.  Previous  research 
has  shown  that  if  a  speech  recognition  device  is  trained  in  a  quiet 
environment  followed  by  attempts  to  use  it  in  a  noisy  environment,  severe 
degradation  in  performance  can  result.  When  the  device  is  trained  and  used 
in  a  noisy  environment,  recognition  accuracy  is  the  same  as  if  the  device 
had  been  trained  and  used  in  a  quiet  environment  (Kersteen,  1982).  In 
order  to  control  the  effects  of  noise,  the  enrollment  of  the  voice 
recognition  system  was  conducted  in  the  presence  of  helicopter  noise. 




Training 

Participants  were  trained  and  tested  individually.  Each 
participant  was  briefed  on  the  purpose  of  the  experiment  and  given  general 
instructions  on  its  conduct.  Training  took  place  in  the  acoustic 
reverberation  chamber  at  the  acoustics  laboratory  at  the  Human  Engineering 
Laboratory. The participant was familiarized with the Votan system and
practiced  using  the  recognizer  with  a  sample  six-word  vocabulary  until  it 
was  evident  that  the  test  procedure  and  use  of  the  voice  recognition  system 
were  clearly  understood.  Next,  the  50-word  vocabulary  was  enrolled  into 
the  voice  recognition  system.  The  words  were  read  from  a  list,  and  the 
list  was  repeated  three  times  as  recommended  by  the  manufacturer.  The 
enrollment  was  accomplished  in  the  presence  of  taped  helicopter  noise. 
Participants  wore  ear  protection  and  headsets  during  the  enrollment. 

Testing


The  baseline  (no  vibration)  data  were  also  collected  in  the 
acoustic  reverberation  chamber  in  the  presence  of  taped  helicopter  noise. 
The  participant  read  the  50-word  list.  A  short  beep  was  presented  through 
the  participant's  earphones  as  a  cue  to  say  the  appropriate  word.  The 
words  were  spoken  in  order,  and  the  participant  was  instructed  to  wait  for 
the  beep  before  speaking  the  next  word.  From  there,  the  participant  and 
the  voice  recognition  box  were  moved  to  the  airfield.  Data  trials  were 
accomplished  with  the  participant  seated  in  the  copilot's  seat.  The 
maneuver conditions tested in the helicopter were:


500-feet per minute climb
hover-out-of-ground effect
45-degree bank turn
60-knot level flight
110-knot level flight
500-feet per minute descent
hover-in-ground effect
ground idle


When  the  pilot  signaled  that  he  had  established  the  appropriate 
maneuver, the participant was signaled to begin reading the list. As in
the baseline data collection, the participant was instructed to wait for
the  beep  prompt  before  saying  each  word.  Although  it  was  impractical  to 
counterbalance  the  maneuver  conditions,  alternate  presentation  orders  were 
used to test for ordering effects.


The vibration levels that occur in a UH-1H helicopter under the
maneuver  conditions  are  shown  in  Table  1.  These  data  indicate  the 
frequency  in  Hertz  and  the  intensity  in  g's  of  the  first  three  harmonics 
(Laing,  Hepler,  &  Merrill,  1973).  The  data  were  gathered  using  an 
accelerometer  mounted  on  the  pilot's  seat  to  record  whole  body  vibrations. 
An  attempt  was  made  to  record  vibration  data  during  the  maneuver  conditions 
of the present study; however, the data were lost.

TABLE 1

Vibration Levels from Laing, Hepler, and Merrill (1973)

                            Harmonics
                 1st         2nd         3rd
Maneuver         Hz/g's      Hz/g's      Hz/g's

Level Flight     7.3/.08     20/.05      27/.04
Hover            10/.03      17.5/.03    27/.02
Climb            6/.03       10/.08      22.5/.04
Descent          5/.03       10/.07      20/.04
Bank             5/.06       10/.1       20/.01
Ground Idle      5/.03       17.5/.02    25/.01

RESULTS 


Because  two  subject  populations  were  used  (aviators  and  nonaviators), 
the  two  groups  were  factored  into  the  experimental  design.  It  was  expected 
that  nonaviators  may  have  encountered  a  higher  degree  of  stress  during  the 
flight  maneuvers  than  aviators.  Stress  has  been  shown  to  affect  vocal 
output  (Dennison  &  Moore,  1984)  and  could  have  confounded  the  results. 

Two  participants'  data  were  dropped  because  a  large  number  of  their 
utterances  were  not  processed  by  the  recognizer.  This  was  an  intermittent 
problem  that  occurred  with  one  aviator  and  one  nonaviator,  and  it  did  not 
occur  in  all  conditions.  The  software  written  to  automate  the  data 
collection  required  a  slight  (750-millisecond)  pause  after  the  beep  prompt 
during  which  the  recognizer  would  sample  the  utterance.  If  the  participant 
spoke  the  word  too  quickly,  it  would  not  be  sampled  and  recognized  by  the 
processor. 

Raw  data,  in  percent,  for  each  condition  were  converted  using  the 
arcsine  conversion  (Ferguson,  1976)  to  satisfy  the  assumptions  of  the 
Analysis  of  Variance. 

\[ X_1 = \arcsin \sqrt{X} \]

where

\( X \) = raw % score

\( X_1 \) = converted score
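
As an illustration, the arcsine conversion can be sketched in Python. Two details are assumptions, because the report does not state them: the raw percent score is divided by 100 before the square root is taken, and the converted score is expressed in degrees. The function names are hypothetical.

```python
import math


def arcsine_transform(percent: float) -> float:
    """Convert a raw percent score (0-100) to arcsin(sqrt(p)), in degrees (assumed unit)."""
    proportion = percent / 100.0  # assumption: percent is rescaled to a proportion
    return math.degrees(math.asin(math.sqrt(proportion)))


def inverse_arcsine_transform(degrees: float) -> float:
    """Map a converted score in degrees back to a percent score."""
    return 100.0 * math.sin(math.radians(degrees)) ** 2


for raw in (25.0, 50.0, 75.0, 100.0):
    print(raw, round(arcsine_transform(raw), 1))
```

The transform stretches scores near 0% and 100%, where the variance of a proportion shrinks, which is why it is commonly applied before an analysis of variance on percentage data.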

The  raw  data  are  not  reported  because  the  intent  of  this  study  is  to 
compare  performance  between  baseline  and  maneuver  conditions;  the  intent  is 
not  to  compare  the  performance  of  this  recognition  system  with  that  of 
other  systems.  Such  a  comparison  would  constitute  a  misapplication  of  data 
unless  the  performance  of  the  two  systems  had  been  measured  under  the  same 
conditions.  Table  2  shows  the  means  for  each  condition  and  submeans  for 
the  five  aviators  versus  five  nonaviators. 


An  analysis  of  variance  with  repeated  measures  for  the  maneuver 
conditions  was  conducted.  The  nonrepeated  factor  was  aviators  versus 
nonaviators.  No  significant  difference  was  detected  between  the 
recognition  accuracy  of  aviators  versus  nonaviators  over  all  levels  of 
maneuver  conditions.  Likewise,  no  significant  differences  were  detected 
among  the  mean  accuracy  scores  of  any  of  the  maneuver  conditions,  including 
the  baseline  condition.  No  significant  interaction  was  noted;  however,  a 
significant between-subjects effect was found, F(8, 64) = 3.63, p < .01.
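
The degrees of freedom in the reported F(8, 64) are consistent with the design described here: two groups, ten retained participants, and nine conditions (baseline plus eight maneuvers). The sketch below shows only the standard bookkeeping for a design with one between-subjects factor and one repeated factor. The report does not give the full analysis of variance table, so the function and its labels are assumptions for illustration.

```python
def mixed_design_df(n_groups: int, n_subjects: int, n_conditions: int) -> dict:
    """Degrees of freedom for one between-subjects factor and one repeated factor."""
    subjects_within = n_subjects - n_groups
    return {
        "groups": n_groups - 1,
        "subjects_within_groups": subjects_within,
        "conditions": n_conditions - 1,
        "groups_x_conditions": (n_groups - 1) * (n_conditions - 1),
        "conditions_x_subjects_within_groups": (n_conditions - 1) * subjects_within,
    }


df = mixed_design_df(n_groups=2, n_subjects=10, n_conditions=9)
for source, value in df.items():
    print(f"{source}: {value}")

# The parts should sum to the total: (subjects x conditions) - 1.
print(sum(df.values()))
```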

TABLE 2

Mean Transformed Recognition Accuracy Data in Percent
by Group and Condition

                                     Conditions
Group               C1    C2    C3    C4    C5    C6    C7    C8    C9

Aviator Means       60.4  56.6  53.5  57.3  63.0  54.1  53.8  57.0  57.0
Nonaviator Means    64.3  54.7  51.9  48.3  53.3  55.0  49.7  55.9  56.7
Grand Means         62.4  55.6  52.7  52.8  58.2  54.5  51.7  56.4  56.8
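
Because the two groups each retained five participants, each grand mean should equal the simple average of the two group means. The following Python sketch, with hypothetical variable names, checks the tabled values against that expectation; small discrepancies of about 0.05 are expected from rounding to one decimal place.

```python
# Values transcribed from Table 2, conditions C1 through C9.
aviator = [60.4, 56.6, 53.5, 57.3, 63.0, 54.1, 53.8, 57.0, 57.0]
nonaviator = [64.3, 54.7, 51.9, 48.3, 53.3, 55.0, 49.7, 55.9, 56.7]
reported_grand = [62.4, 55.6, 52.7, 52.8, 58.2, 54.5, 51.7, 56.4, 56.8]


def grand_means(a, b):
    """Average two equal-sized groups' means, condition by condition."""
    return [(x + y) / 2 for x, y in zip(a, b)]


computed = grand_means(aviator, nonaviator)
for i, (calc, rep) in enumerate(zip(computed, reported_grand), start=1):
    print(f"C{i}: computed {calc:.2f}, reported {rep:.1f}")
```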

C1 - baseline (no vibration)
C2 - 500-feet per minute climb
C3 - hover-out-of-ground effect
C4 - 45-degree bank turn
C5 - 60-knot level flight
C6 - 110-knot level flight
C7 - 500-feet per minute descent
C8 - hover-in-ground effect
C9 - ground idle


DISCUSSION

Whether a participant was an aviator or a nonaviator had no effect on
the accuracy of recognition of his utterances. The possibility that this
variable confounded the results can, therefore, be ruled out. More
importantly, there was no difference in recognition accuracy between the
baseline condition and any of the in-flight maneuver conditions, nor were
there any differences among the maneuver conditions. These findings agree
with those of the previously cited studies (Dennison, unpublished, 1985;
Cruise, Denson, & Rajasekaran, in press) conducted on "shaker tables" in
that vibration caused no degradation of recognition system accuracy.
Moreover, these results go beyond those obtained in simulated vibration
conditions in providing real-world verification that vibration will not
impede the use of voice recognition technology in helicopters. Of interest
would be a study that explores the limits of vibration at which recognition
will be degraded. It may be difficult, however, to perform this research
without endangering subjects. The possibility of computer modeling of the
vocal system and its response to vibration should be explored.

The significant between-subjects effect indicates that there were
greater-than-chance differences among the participants'
recognition rates when compared to each other. Other studies have reported
high variability in recognition rates based on individual differences
(Aretz, 1983; Malkin, 1983; Malkin & Christ, 1985b).

This may be one of the most urgent problems that users and makers of
speech recognition algorithms face. If speech recognition systems are to
be acceptable for extensive use in Army helicopter cockpits, then
manufacturers must provide systems that recognize a wider range of the
population.

APPENDIX

FRIGHT

TURN 

AID 

WIELD 

GAB 

ROUGE 

DUMP 

MAP 

HOSE 

STRESS 

RUG 

BOOK 

LEASH 

CLIFF 

FIFTH 

THRESH 

BARGE 

LAY 

DIN 

SHEIK

PART 

HAD 

SANG 

KNEE 

HASH 


LOUSE 

PITCH 

PUMP 

CREWS 

TUCK 

TON 

ROCK 

SUIT 

DAME 

TIRE 

VOW 

SHEEP 

STAB 

INK 

SORE 

THREE 

DUB 

RYE 

CHEESE 

KIND 

NEXT 

CLOSED 

GAS 

DRAPE 

NAP 